The Ancient Greek and Latin Dependency Treebanks
Authors
Abstract
This paper describes the development, composition, and several uses of the Ancient Greek and Latin Dependency Treebanks, large collections of Classical texts in which the syntactic, morphological and lexical information for each word is made explicit. To date, over 200 individuals from around the world have collaborated to annotate over 350,000 words, including the entirety of Homer’s Iliad and Odyssey, Sophocles’ Ajax, all of the extant works of Hesiod and Aeschylus, and selections from Caesar, Cicero, Jerome, Ovid, Petronius, Propertius, Sallust and Vergil. While perhaps the most straightforward value of such an annotated corpus for Classical philology is the morphosyntactic searching it makes possible, it also enables a large number of downstream tasks, such as inducing the syntactic behavior of lexemes and automatically identifying similar passages between texts.
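As a rough illustration of the morphosyntactic searching mentioned in the abstract, the following Python sketch queries a single hand-made sentence for words that stand in a given dependency relation and morphological case to a head with a given lemma. Everything in it is an assumption made for illustration: the `Token` fields, the labels `SBJ`, `OBJ` and `PRED`, the case codes, the function `find_dependents`, and the toy sentence "puella rosam amat" are hypothetical and are not taken from the treebanks or their file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word (hypothetical schema, for illustration only)."""
    id: int         # 1-based position in the sentence
    form: str       # surface form as it appears in the text
    lemma: str      # dictionary headword
    pos: str        # coarse part of speech
    case: str       # morphological case, "" when not applicable
    head: int       # id of the syntactic head, 0 for the root
    relation: str   # dependency label linking the word to its head


# Toy sentence "puella rosam amat" ("the girl loves the rose"),
# annotated by hand for this example only.
SENTENCE = [
    Token(1, "puella", "puella", "noun", "nom", 3, "SBJ"),
    Token(2, "rosam", "rosa", "noun", "acc", 3, "OBJ"),
    Token(3, "amat", "amo", "verb", "", 0, "PRED"),
]


def find_dependents(sentence, head_lemma, relation, case):
    """Return tokens with the given relation and case whose head has head_lemma."""
    by_id = {token.id: token for token in sentence}
    matches = []
    for token in sentence:
        head = by_id.get(token.head)  # None for the root (head id 0)
        if head is None:
            continue
        if (head.lemma == head_lemma
                and token.relation == relation
                and token.case == case):
            matches.append(token)
    return matches


# Accusative objects of the verb "amo" in the toy sentence.
objects = find_dependents(SENTENCE, "amo", "OBJ", "acc")
print([token.form for token in objects])
```

Running the same query over every sentence of a corpus and tallying the cases and relations found for each head lemma would be one simple way to approach the lexeme-level syntactic behavior that the abstract lists among downstream tasks.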
Related resources
Structured Knowledge for Low-Resource Languages: The Latin and Ancient Greek Dependency Treebanks
We describe here our work in creating treebanks – large collections of syntactically annotated data – for Latin and Ancient Greek. While the treebanks themselves present important datasets for traditional research in philology and linguistics, the layers of structured knowledge they contain (including disambiguated lemma, morphological, and syntactic information for every word) help offset the ...
Porting an Ancient Greek and Latin Treebank
We have recently converted a dependency treebank, consisting of ancient Greek and Latin texts, from one annotation scheme to another that was independently designed. This paper makes two observations about this conversion process. First, we show that, despite significant surface differences between the two treebanks, a number of straightforward transformation rules yield a substantial level of ...
Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek
One easily observable aspect of language variation is the order of words. In human and machine natural language processing, it is often claimed that parsing free-order languages is more difficult than parsing fixed-order languages. In this study on Latin and Ancient Greek, two well-known and well-documented free-order languages, we propose syntactic correlates of word order freedom. We apply our ...
Corpus Linguistics, Treebanks and the Reinvention of Philology
The fields of corpus and computational linguistics address fundamental goals – and challenge us to rethink the structure – of humanistic research. All work with historical languages is, in some sense, an exercise in corpus linguistics. The Greek and Latin Treebanks illustrate changes in intellectual practice. Linguistic annotation of historical corpora serves a different community and offers a ...
Will a Parser Overtake Achilles? First experiments on parsing the Ancient Greek Dependency Treebank
We present a number of experiments on parsing the Ancient Greek Dependency Treebank (AGDT), i.e. the largest syntactically annotated corpus of Ancient Greek currently available (ca. 350k words). Although the AGDT is rather unbalanced and far from being representative of all genres and periods of Ancient Greek, no attempt has been made so far to perform automatic dependency parsing of Ancient Gre...